🐿️ Scour
📃 Manuscript Tokenization

Medieval Text Processing, Paleographic Parsing, Historical NLP, Character Segmentation

Unfolding the Past: A Comprehensive Deep Learning Approach to Analyzing Incunabula Pages
arxiv.org·1d
🤖Manuscript AI
davidchisnall/igk: I got Knuth'd: A compiler for documents
github.com·8h
📝Concrete Syntax
Why Your Next LLM Might Not Have A Tokenizer
towardsdatascience.com·18h
🤖Grammar Induction
Building an ML model to generate fonts
fontweaver.com·1d·
Discuss: Hacker News
🔠Terminal Fonts
The modern text processing pipeline: Overview
newroadoldway.com·1d·
Discuss: Lobsters, r/programming
🔤Unicode Normalization
Unveiling Factors for Enhanced POS Tagging: A Study of Low-Resource Medieval Romance Languages
arxiv.org·1d
👁️Medieval OCR
Kumo Surfaces Structured Data Patterns Generative AI Misses
thenewstack.io·15m
📊Graph Databases
Using Wavelets and Clustering to Predict Odd or Even Numbers: An Overengineered Approach with Pretty (But Confusing) Plots
dev.to·32m·
Discuss: DEV
🧠Machine Learning
Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated
arxiv.org·10h
🧮Kolmogorov Complexity
June 25, 2025 Flight Tracking Workshop (4 hour) [Americas / Europe-friendly time]
bellingcat.com·14h
🧮Prolog Parsing
Using an LLM for query planning in RAG –> 40% better answer relevance
techcommunity.microsoft.com·18h·
Discuss: Hacker News
🔍Information Retrieval
Capturing my handwriting in a searchable digital format – the long way round
colinramsay.co.uk·1d·
Discuss: Hacker News
📲Digitization
Text2Struct: A Machine Learning Pipeline for Mining Structured Data from Text
arxiv.org·1d
🔤Character Classification
The Internal Inconsistency of Large Language Models
blog.kortlepel.com·21h·
Discuss: Hacker News
💻Local LLMs
Portable Network Graphics (PNG) Specification (Third Edition)
w3.org·17h·
Discuss: Hacker News
🕸️WebP Analysis
The Bitter Lesson is coming for Tokenization
lucalp.dev·1d·
Discuss: Lobsters, Hacker News, r/programming
🔗Monadic Parsing
Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast
news.ycombinator.com·1d·
Discuss: Hacker News
🌀Brotli Internals
LR(1) parse-tables generator
github.com·1d·
Discuss: Lobsters, Hacker News
🔍Z3 Parsing
Best Ways to Translate Documents Online Using AI – Secure OCR, Layout Retention, and Top Tools Compared
dev.to·3d·
Discuss: DEV
🤖AI Translation
QuranMorph: Morphologically Annotated Quranic Corpus
arxiv.org·1d
📋Document Grammar